changed default value of zeroing-threshold in BackpropTruncationCompo… #1240
…nent to 15; updated the results on AMI
danpovey merged 1 commit into kaldi-asr:fast_lstm
-  BaseFloat clipping_threshold = 15.0;
-  BaseFloat zeroing_threshold = 2.0;
+  BaseFloat clipping_threshold = 30.0;
+  BaseFloat zeroing_threshold = 15.0;
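For readers unfamiliar with the two thresholds, here is a minimal sketch of per-frame derivative truncation. It is not the Kaldi implementation. It assumes that clipping rescales any derivative row whose L2 norm exceeds clipping_threshold, and that zeroing discards a row whose norm exceeds zeroing_threshold only at designated boundary frames. The function name `truncate_deriv_row` and the `at_zeroing_boundary` flag are hypothetical; the default values are the 30.0 and 15.0 shown in the diff.

```python
import math


def truncate_deriv_row(row, clipping_threshold=30.0, zeroing_threshold=15.0,
                       at_zeroing_boundary=False):
    """Illustrative only: truncate one frame's backpropagated derivative.

    Assumptions (not taken from the Kaldi source):
      * the norm is the L2 norm of the row;
      * zeroing is only considered when at_zeroing_boundary is True.
    """
    norm = math.sqrt(sum(x * x for x in row))

    # Zeroing: drop the derivative entirely so that a large gradient
    # does not propagate further back in time.
    if at_zeroing_boundary and norm > zeroing_threshold:
        return [0.0] * len(row)

    # Clipping: rescale so that the norm equals the clipping threshold.
    if norm > clipping_threshold:
        scale = clipping_threshold / norm
        return [x * scale for x in row]

    # Small derivatives pass through unchanged.
    return list(row)
```

Under these assumptions, raising either threshold lets larger derivatives through untouched, which is why larger defaults offer less protection against divergence.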
Larger values of these quantities are more dangerous, i.e. more likely to lead to instability.
I don't think it's sufficient to just test this on one setup, because it's the potential for divergence that this is supposed to guard against. Have you done any other tests?
Also, this PR would need to be on top of the 'fast_lstm' branch-- there are
other LSTM config-generation objects there that would have to be changed.
But I want you to test it in that setup. And I'd be more comfortable with
smaller thresholds, like 5 and 15 or 5 and 20, instead of 20 and 30, if
there is no clear difference in results. It's safer in situations where
divergence is a possibility. The WER improvements you had in the RESULTS
file were rather unimpressive.
Dan
On Thu, Dec 1, 2016 at 11:42 PM, Yiming Wang ***@***.***> wrote:
Commit Summary
- changed default value of zeroing-threshold in BackpropTruncationComponent to 15; updated the results on AMI
File Changes
- *M* egs/ami/s5b/RESULTS_ihm <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-0> (5)
- *M* egs/ami/s5b/RESULTS_sdm <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-1> (5)
- *M* egs/wsj/s5/steps/libs/nnet3/xconfig/lstm.py <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-2> (16)
- *M* egs/wsj/s5/steps/nnet3/components.py <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-3> (4)
- *M* egs/wsj/s5/steps/nnet3/lstm/make_configs.py <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-4> (2)
- *M* src/nnet3/nnet-general-component.cc <https://github.com/kaldi-asr/kaldi/pull/1240/files#diff-5> (4)
Patch Links:
- https://github.com/kaldi-asr/kaldi/pull/1240.patch
- https://github.com/kaldi-asr/kaldi/pull/1240.diff
This PR is already on top of fast_lstm. The old WERs reported in RESULTS were obtained without zeroing (i.e. using ClipGradientComponent, as the comment said).
The results of tuning zeroing-threshold on ihm are:
I have not tuned it on sdm1.
After the fix of max-deriv-time, gradient explosion did not happen even on the babel georgian multicondition data (which had the most severe problem before the fix): when I disabled the zeroing, the clipped-proportion was at most ~0.004.
I also tuned the zeroing-threshold on swbd blstm_6i:
The reason I chose 15.0 as the threshold rather than 5 or 10 is mainly the WER on swbd. I am not sure how much variation there could be between different runs with the same settings.
OK I will think about it.
FYI, I added the zeroed-proportion stats of the 1st layer at the last iteration, which show how often zeroing was activated (ami ihm, swbd):
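As a rough illustration of what such statistics measure, the sketch below computes a clipped-proportion and a zeroed-proportion over a batch of derivative rows. It assumes that clipped-proportion is the fraction of all rows whose L2 norm exceeds the clipping threshold, and that zeroed-proportion is the fraction of zeroing-boundary rows whose norm exceeds the zeroing threshold. The function `truncation_stats` and its arguments are hypothetical and are not part of Kaldi.

```python
import math


def truncation_stats(rows, boundary_flags,
                     clipping_threshold=30.0, zeroing_threshold=15.0):
    """Illustrative only: return (clipped_proportion, zeroed_proportion).

    rows           -- list of derivative rows (lists of floats), one per frame
    boundary_flags -- list of bools, True where the frame is assumed to be
                      a zeroing boundary
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in rows]

    # Rows whose norm would trigger clipping, over all rows.
    num_clipped = sum(1 for n in norms if n > clipping_threshold)

    # Rows whose norm would trigger zeroing, over boundary rows only.
    num_boundaries = sum(1 for flag in boundary_flags if flag)
    num_zeroed = sum(1 for n, flag in zip(norms, boundary_flags)
                     if flag and n > zeroing_threshold)

    clipped = num_clipped / len(norms) if norms else 0.0
    zeroed = num_zeroed / num_boundaries if num_boundaries else 0.0
    return clipped, zeroed
```

A proportion close to zero, under these assumptions, would mean the corresponding threshold is rarely reached during training.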
OK, I'll merge this.